Latent Diffusion Bridges for Unsupervised Timbre Transfer Demo

This demo page accompanies the paper "Latent Diffusion Bridges for Unsupervised Timbre Transfer".

Source code: link

Table of Contents

  1. Timbre Transfer Results
    1.1. Normal Instruments Created with Our Method
    1.2. Pitch-Shifted Flute
    1.3. Chunk-Based Minibatch

  2. Impact of Different Sigma Max and Sigma N

  3. Shared Space

  4. Cycle Consistency

Timbre Transfer Results

Normal Instruments Created with Our Method

Source    Target    DPD    JD
flute     violin    0.07   0.0
flute     trumpet   0.05   0.0
violin    flute     0.1    0.2
violin    trumpet   0.13   0.1
trumpet   flute     0.02   0.0
trumpet   violin    0.02   0.0
bassoon   cello     0.12   0.0
cello     bassoon   0.07   0.0

Pitch-Shifted Flute

Source                        Target    DPD    JD
flute shifted -20 semitones   bassoon   0.21   0.0
flute shifted -25 semitones   bassoon   0.6    0.25
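
For a sense of how large these shifts are, the snippet below converts a semitone offset into a frequency ratio using the standard 12-tone equal temperament relation, ratio = 2^(n/12). It only illustrates the arithmetic. The page does not say which tool or method was used to shift the flute recordings, and the function name `semitone_ratio` is hypothetical.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


# The two shifts listed in the table above.
for shift in (-20, -25):
    ratio = semitone_ratio(shift)
    print(f"{shift} semitones -> frequency multiplied by {ratio:.3f}")
```

A shift of -20 semitones scales every frequency by roughly 0.315, and -25 semitones by roughly 0.236, so the second source sits more than two octaves below the original flute.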

Chunk-Based Minibatch

Source   Time chunk size   Channel chunk size   Target   DPD    JD
flute    4                 0                    violin   0.12   0.0
flute    4                 32                   violin   0.2    0.0
violin   4                 0                    flute    0.09   0.0
violin   4                 32                   flute    0.13   0.1

Impact of Different Sigma Max and Sigma N

Source   sigma_max   sigma_N   Noise          Target   DPD    JD
violin   100         100       noisy violin   flute    2.39   0.64
violin   100         5         noisy violin   flute    0.12   0.1
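
The page does not state how the "noisy violin" in the Noise column is produced. The following is a minimal sketch under an assumed variance-exploding formulation, in which a source latent x is perturbed to x + sigma_N * eps with eps drawn from a standard Gaussian. The function `add_noise` and the toy latent are hypothetical and are not taken from the paper's code.

```python
import random


def add_noise(latent, sigma, seed=0):
    """Return latent + sigma * eps, eps ~ N(0, 1) per element (assumed noising rule)."""
    rng = random.Random(seed)  # fixed seed so both sigma values see the same eps
    return [x + sigma * rng.gauss(0.0, 1.0) for x in latent]


# Toy stand-in for a source latent; real latents would come from an encoder.
latent = [0.5, -1.0, 0.25, 2.0]

for sigma_n in (5.0, 100.0):
    noisy = add_noise(latent, sigma_n)
    # Largest deviation from the clean latent grows in proportion to sigma_N.
    max_dev = max(abs(n - x) for n, x in zip(noisy, latent))
    print(f"sigma_N={sigma_n}: max deviation from source = {max_dev:.2f}")
```

Under this assumption, the perturbation at sigma_N=100 is 20 times larger than at sigma_N=5 for the same noise draw, which is consistent with the much higher DPD and JD reported for sigma_N=100.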

Shared Space

The following audio samples were generated using the flute and violin models, both with sigma_max=100 and sigma_N=100, by sampling directly from N(0, sigma_max). The examples show one flute-violin pair judged melodically similar and one pair judged melodically different.

Flute-violin pair                 DPD    JD
Similar Melodies (DPD < 0.7)      0.52   0.18
Different Melodies (DPD >= 0.7)   1.77   0.25
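
The similar/different split above is a threshold on DPD. A minimal sketch of that rule follows, using the 0.7 cutoff and the two (DPD, JD) pairs listed on this page. The helper `melody_label` is a hypothetical name, and how DPD itself is computed is not shown here.

```python
DPD_THRESHOLD = 0.7  # cutoff stated in the row labels above


def melody_label(dpd: float, threshold: float = DPD_THRESHOLD) -> str:
    """Label a flute-violin pair by comparing its DPD against the threshold."""
    return "similar" if dpd < threshold else "different"


# (DPD, JD) values for the two example pairs on this page.
pairs = [(0.52, 0.18), (1.77, 0.25)]
for dpd, jd in pairs:
    print(f"DPD={dpd}, JD={jd}: {melody_label(dpd)} melodies")
```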

Cycle Consistency